{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img align=\"right\" style=\"max-width: 200px; height: auto\" src=\"cfds_logo.png\">\n",
    "\n",
    "###  Lab 08 - \"Deep Learning - Artificial Neural Networks\"\n",
    "\n",
    "Chartered Financial Data Scientist (CFDS), Spring Term 2020"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the last lab you learned about how to utilize two **unsupervised** machine learning technique namely (1) the **k-Means Clustering** and (2) the **Expectation Maximization (EM) Clustering** algorithm to cluster the features (petal length, petal width, sepal length, and sepal width) of distinct observations of iris flowers. \n",
    "\n",
    "In this lab, we will learn how to implement, train, and apply our first **Artificial Neural Network (ANN)** using a Python library named **'PyTorch'**. PyTorch is an open-source machine learning library for Python, used for a variety of applications such as image classification and natural language processing.\n",
    "\n",
    "We will use the implemented neural network to classify images of **handwritten digits** (i.e., data without defined categories or groups). The figure below illustrates a high-level view on the machine learning process we aim to establish in this lab."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img align=\"center\" style=\"max-width: 700px\" src=\"classification.png\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As always, pls. don't hesitate to ask all your questions either during the lab or send us an email (using our\n",
    "fds.ai email addresses)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Lab Objectives:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After today's lab, you should be able to:\n",
    "\n",
    "> 1. Understand the basic concepts, intuitions and major building blocks of **Artificial Neural Networks (ANNs)**.\n",
    "> 2. Know how to use Python's **PyTorch library** to train and evaluate neural network based models.\n",
    "> 3. Understand how to apply neural networks to **classify images** of handwritten digits.\n",
    "> 4. Know how to **interpret the detection results** of the network as well as its **reconstruction loss**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Setup of the Jupyter Notebook Environment"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similar to the previous labs, we need to import a couple of Python libraries that allow for data analysis and data visualization. We will mostly use the `PyTorch`, `Numpy`, `Sklearn`, `Matplotlib`, `Seaborn` and a few utility libraries throughout this lab:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import standard python libraries\n",
    "import os\n",
    "from datetime import datetime\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Import the Python machine / deep learning libraries:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import the PyTorch deep learning libary\n",
    "import torch, torchvision\n",
    "import torch.nn.functional as F\n",
    "from torch import nn, optim\n",
    "from torch.autograd import Variable"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Import the sklearn classification metrics:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import sklearn classification evaluation library\n",
    "from sklearn import metrics\n",
    "from sklearn.metrics import classification_report, confusion_matrix"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Import Python plotting libraries:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import matplotlib, seaborn, and PIL data visualization libary\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "from PIL import Image"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Enable notebook matplotlib inline plotting:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create a structure of notebook sub-directories to store the data as well as the trained neural network models:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if not os.path.exists('./data'): os.makedirs('./data')  # create data directory\n",
    "if not os.path.exists('./models'): os.makedirs('./models')  # create trained models directory"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Dataset Download and Data Assessment"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The **MNIST database** (**M**odified **N**ational **I**nstitute of **S**tandards and **T**echnology database) is a large database of handwritten digits that is commonly used for training various image processing systems. The database is widely used for training and testing in the field of machine learning. Let's have a brief look into a couple of sample images contained in the dataset:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img align=\"center\" style=\"max-width: 500px; height: 300px\" src=\"mnist.png\">\n",
    "\n",
    "(Source: https://en.wikipedia.org/wiki/MNIST_database)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Further details on the dataset can be obtained via: *LeCun, Y., 1998. \"The MNIST database of handwritten digits\", ( http://yann.lecun.com/exdb/mnist/ ).\"*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The MNIST database contains **60,000 training images** and **10,000 evaluation images**. The size of each image is 28 by 28 pixels. The handwritten digits contained in each fixe-sized image have been size-normalized and centred. The MNIST dataset is a great dataset to start with when learning about machine learning techniques and pattern recognition methods on real-world data. It requires minimal efforts on preprocessing and formatting the distinct images."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's download, transform and inspect the training images of the dataset. Therefore, let's first define the directory in which we aim to store the training data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_path = './data/train_mnist'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's download the training data accordingly:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define pytorch transformation into tensor format\n",
    "transf = torchvision.transforms.Compose([torchvision.transforms.ToTensor()])\n",
    "\n",
    "# download and transform training images\n",
    "mnist_train_data = torchvision.datasets.MNIST(root=train_path, train=True, transform=transf, download=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Verify the number of training images downloaded:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# determine the number of training data images\n",
    "len(mnist_train_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Furthermore, let's inspect a couple of the downloaded training images:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# select and set a (random) image id\n",
    "image_id = 2000\n",
    "\n",
    "# retrieve image exhibiting the image id\n",
    "mnist_train_data[image_id]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ok, that doesn't seem right :). Let's now seperate the image from its label information:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mnist_train_image, mnist_train_label = mnist_train_data[image_id]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Great, let's now visually inspect our sample image: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define tensor to image transformation\n",
    "trans = torchvision.transforms.ToPILImage()\n",
    "\n",
    "# set image plot title \n",
    "plt.title('Example: {}, Label: {}'.format(str(image_id), str(mnist_train_label)))\n",
    "\n",
    "# plot mnist handwritten digit sample\n",
    "plt.imshow(trans(mnist_train_image), cmap='gray')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Fantastic, right? Let's now define the directory in which we aim to store the evaluation data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "eval_path = './data/eval_mnist'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And download the evaluation data accordingly:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define pytorch transformation into tensor format\n",
    "transf = torchvision.transforms.Compose([torchvision.transforms.ToTensor()])\n",
    "\n",
    "# download and transform training images\n",
    "mnist_eval_data = torchvision.datasets.MNIST(root=eval_path, train=False, transform=transf, download=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's also verify the number of evaluation images downloaded:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# determine the number of evaluation data images\n",
    "len(mnist_eval_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Neural Network Implementation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section we, will implement the architecture of the **neural network** we aim to utilize to learn a model that is capable to classify the 28x28 pixel MNIST images of handwritten digits. However, before we start the implementation let's briefly revisit the process to be established. The following cartoon provides a birds-eye view:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img align=\"center\" style=\"max-width: 900px\" src=\"process.png\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2.1 Implementation of the Neural Network Architecture"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The neural network, which we name **'MNISTNet'** consists of three **fully-connected layers** (including an “input layer” and two hidden layers). Furthermore, the **MNISTNet** should encompass the following number of neurons per layer: 100 (layer 1), 50 (layer 2) and 10 (layer 3). Meaning the first layer consists of 100 neurons, the second layer of 50 neurons and third layer of 10 neurons (the number of digit classes we aim to classify."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will now start implementing the network architecture as a separate Python class. Implementing the network architectures as a **separate class** in Python is good practice in deep learning projects. It will allow us to create and train several instances of the same neural network architecture. This provides us, for example, the opportunity to evaluate different initializations of the network parameters or train models using distinct datasets. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# implement the MNISTNet network architecture\n",
    "class MNISTNet(nn.Module):\n",
    "    \n",
    "    # define the class constructor\n",
    "    def __init__(self):\n",
    "        \n",
    "        # call super class constructor\n",
    "        super(MNISTNet, self).__init__()\n",
    "        \n",
    "        # specify fully-connected (fc) layer 1 - in 28*28, out 100\n",
    "        self.linear1 = nn.Linear(28*28, 100, bias=True) # the linearity W*x+b\n",
    "        self.relu1 = nn.ReLU(inplace=True) # the non-linearity \n",
    "        \n",
    "        # specify fc layer 2 - in 100, out 50\n",
    "        self.linear2 = nn.Linear(100, 50, bias=True) # the linearity W*x+b\n",
    "        self.relu2 = nn.ReLU(inplace=True) # the non-linarity\n",
    "        \n",
    "        # specify fc layer 3 - in 50, out 10\n",
    "        self.linear3 = nn.Linear(50, 10) # the linearity W*x+b\n",
    "        \n",
    "        # add a softmax to the last layer\n",
    "        self.logsoftmax = nn.LogSoftmax(dim=1) # the softmax\n",
    "        \n",
    "    # define network forward pass\n",
    "    def forward(self, images):\n",
    "        \n",
    "        # reshape image pixels\n",
    "        x = images.view(-1, 28*28)\n",
    "        \n",
    "        # define fc layer 1 forward pass\n",
    "        x = self.relu1(self.linear1(x))\n",
    "        \n",
    "        # define fc layer 2 forward pass\n",
    "        x = self.relu2(self.linear2(x))\n",
    "        \n",
    "        # define layer 3 forward pass\n",
    "        x = self.logsoftmax(self.linear3(x))\n",
    "        \n",
    "        # return forward pass result\n",
    "        return x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You may have noticed, when reviewing the implementation above, that we applied an additional operator, referred to as **'Softmax'** to the third layer of our neural network.\n",
    "\n",
    "The **softmax function**, also known as the normalized exponential function, is a function that takes as input a vector of K real numbers, and normalizes it into a probability distribution consisting of K probabilities. \n",
    "\n",
    "That is, prior to applying softmax, some vector components could be negative, or greater than one; and might not sum to 1; but after application of the softmax, each component will be in the interval $(0,1)$, and the components will add up to 1, so that they can be interpreted as probabilities. In general, the softmax function $\\sigma :\\mathbb {R} ^{K}\\to \\mathbb {R} ^{K}$ is defined by the formula:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center> $\\sigma (\\mathbf {z} )_{i}=\\ln ({e^{z_{i}} / \\sum _{j=1}^{K}e^{z_{j}}})$ </center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "for $i = 1, …, K$ and ${\\mathbf {z}}=(z_{1},\\ldots ,z_{K})\\in \\mathbb {R} ^{K}$ (Source: https://en.wikipedia.org/wiki/Softmax_function ). \n",
    "\n",
    "Let's have a look at the simplified three-class example below. The scores of the distinct predicted classes $c_i$ are computed from the forward propagation of the network. We then take the softmax and obtain the probabilities as shown:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img align=\"center\" style=\"max-width: 600px\" src=\"softmax.png\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The output of the softmax describes the probability (or if you may, the confidence) of the neural network that a particular sample belongs to a certain class. Thus, for the first example above, the neural network assigns a confidence of 0.49 that it is a 'three', 0.49 that it is a 'four', and 0.03 that it is an 'eight'. The same goes for each of the samples above.\n",
    "\n",
    "Now, that we have implemented our first neural network we are ready to instantiate a network model to be trained:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = MNISTNet()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once the model is initialized, we can visualize the model structure and review the implemented network architecture by execution of the following cell:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# print the initialized architectures\n",
    "print('[LOG] MNISTNet architecture:\\n\\n{}\\n'.format(model))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Looks like intended? Brilliant! Finally, let's have a look into the number of model parameters that we aim to train in the next steps of the notebook:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# init the number of model parameters\n",
    "num_params = 0\n",
    "\n",
    "# iterate over the distinct parameters\n",
    "for param in model.parameters():\n",
    "\n",
    "    # collect number of parameters\n",
    "    num_params += param.numel()\n",
    "    \n",
    "# print the number of model paramters\n",
    "print('[LOG] Number of to be trained MNISTNet model parameters: {}.'.format(num_params))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ok, our \"simple\" MNISTNet model already encompasses an impressive number 84'060 model parameters to be trained."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2.2 Specification of the Neural Network Loss Function"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have implemented the **MNISTNet** we are ready to train the network. However, prior to starting the training, we need to define an appropriate loss function. Remember, we aim to train our model to learn a set of model parameters $\\theta$ that minimize the classification error of the true class $c^{i}$ of a given handwritten digit image $x^{i}$ and its predicted class $\\hat{c}^{i} = f_\\theta(x^{i})$ as faithfully as possible. \n",
    "\n",
    "Thereby, the training objective is to learn a set of optimal model parameters $\\theta^*$ that optimize $\\arg\\min_{\\theta} \\|C - f_\\theta(X)\\|$ over all training images in the MNIST dataset. To achieve this optimization objective, one typically minimizes a loss function $\\mathcal{L_{\\theta}}$ as part of the network training. In this lab we use the **'Negative Log Likelihood (NLL)'** loss, defined by:\n",
    "\n",
    "<center> $\\mathcal{L}^{NLL}_{\\theta} (c_i, \\hat c_i) = - \\frac{1}{N} \\sum_{i=1}^N \\log (\\hat{c}_i) $, </center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "for a set of $n$-MNIST images $x^{i}$, $i=1,...,n$ and their respective predicted class labels $\\hat{c}^{i}$. This is summed for all the correct classes. \n",
    "\n",
    "Let's have a look at a brief example:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img align=\"center\" style=\"max-width: 600px\" src=\"loss.png\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "During training the **NLL** loss will penalize models that result in a high classification error between the predicted class labels $\\hat{c}^{i}$ and their respective true class label $c^{i}$. Luckily, an implementation of the NLL loss is already available in PyTorch! It can be instantiated \"off-the-shelf\" via the execution of the following PyTorch command:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define the optimization criterion / loss function\n",
    "nll_loss = nn.NLLLoss()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Training the Neural Network Model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section, we will train our neural network model (as implemented in the section above) using the transformed images of handwritten digits. More specifically, we will have a detailed look into the distinct training steps as well as how to monitor the training progress."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.1. Preparing the Network Training"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So far, we have pre-processed the dataset, implemented the ANN and defined the classification error. Let's now start to train a corresponding model for **20 epochs** and a **mini-batch size of 128** MNIST images per batch. This implies that the whole dataset will be fed to the ANN 20 times in chunks of 128 images yielding to **469 mini-batches** (60.000 images / 128 images per mini-batch) per epoch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# specify the training parameters\n",
    "num_epochs = 20 # number of training epochs\n",
    "mini_batch_size = 128 # size of the mini-batches"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Based on the loss magnitude of a certain mini-batch PyTorch automatically computes the gradients. But even better, based on the gradient, the library also helps us in the optimization and update of the network parameters $\\theta$.\n",
    "\n",
    "We will use the **Stochastic Gradient Descent (SGD) optimization** and set the learning-rate $l = 0.001$. Each mini-batch step the optimizer will update the model parameters $\\theta$ values according to the degree of classification error (the MSE loss)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define learning rate and optimization strategy\n",
    "learning_rate = 0.001\n",
    "optimizer = optim.SGD(params=model.parameters(), lr=learning_rate)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have successfully implemented and defined the three ANN building blocks let's take some time to review the `MNISTNet` model definition as well as the `loss`. Please, read the above code and comments carefully and don't hesitate to let us know any questions you might have."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Furthermore, lets specify and instantiate a corresponding PyTorch data loader that feeds the image tensors to our neural network:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mnist_train_dataloader = torch.utils.data.DataLoader(mnist_train_data, batch_size=mini_batch_size, shuffle=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.2. Running the Network Training"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we start training the model. The detailed training procedure for each mini-batch is performed as follows: \n",
    "\n",
    ">1. do a forward pass through the MNISTNet network, \n",
    ">2. compute the negative log likelihood classification error $\\mathcal{L}^{NLL}_{\\theta}(c^{i};\\hat{c}^{i})$, \n",
    ">3. do a backward pass through the MNISTNet network, and \n",
    ">4. update the parameters of the network $f_\\theta(\\cdot)$.\n",
    "\n",
    "To ensure learning while training our ANN model, we will monitor whether the loss decreases with progressing training. Therefore, we obtain and evaluate the classification performance of the entire training dataset after each training epoch. Based on this evaluation, we can conclude on the training progress and whether the loss is converging (indicating that the model might not improve any further).\n",
    "\n",
    "The following elements of the network training code below should be given particular attention:\n",
    " \n",
    ">- `loss.backward()` computes the gradients based on the magnitude of the reconstruction loss,\n",
    ">- `optimizer.step()` updates the network parameters based on the gradient."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# init collection of training epoch losses\n",
    "train_epoch_losses = []\n",
    "\n",
    "# set the model in training mode\n",
    "model.train()\n",
    "\n",
    "# train the MNISTNet model\n",
    "for epoch in range(num_epochs):\n",
    "    \n",
    "    # init collection of mini-batch losses\n",
    "    train_mini_batch_losses = []\n",
    "    \n",
    "    # iterate over all-mini batches\n",
    "    for i, (images, labels) in enumerate(mnist_train_dataloader):\n",
    "        \n",
    "        # run forward pass through the network\n",
    "        output = model(images)\n",
    "        \n",
    "        # reset graph gradients\n",
    "        model.zero_grad()\n",
    "        \n",
    "        # determine classification loss\n",
    "        loss = nll_loss(output, labels)\n",
    "        \n",
    "        # run backward pass\n",
    "        loss.backward()\n",
    "        \n",
    "        # update network paramaters\n",
    "        optimizer.step()\n",
    "        \n",
    "        # collect mini-batch reconstruction loss\n",
    "        train_mini_batch_losses.append(loss.data.item())\n",
    "    \n",
    "    # determine mean min-batch loss of epoch\n",
    "    train_epoch_loss = np.mean(train_mini_batch_losses)\n",
    "    \n",
    "    # print epoch loss\n",
    "    now = datetime.utcnow().strftime(\"%Y%m%d-%H:%M:%S\")\n",
    "    print('[LOG {}] epoch: {} train-loss: {}'.format(str(now), str(epoch), str(train_epoch_loss)))\n",
    "    \n",
    "    # save model to local directory\n",
    "    model_name = 'mnist_model_epoch_{}.pth'.format(str(epoch))\n",
    "    torch.save(model.state_dict(), os.path.join(\"./models\", model_name))\n",
    "    \n",
    "    # determine mean min-batch loss of epoch\n",
    "    train_epoch_losses.append(train_epoch_loss)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Upon successfull training let's visualize and inspect the training loss per epoch:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# prepare plot\n",
    "fig = plt.figure()\n",
    "ax = fig.add_subplot(111)\n",
    "\n",
    "# add grid\n",
    "ax.grid(linestyle='dotted')\n",
    "\n",
    "# plot the training epochs vs. the epochs' classification error\n",
    "ax.plot(np.array(range(1, len(train_epoch_losses)+1)), train_epoch_losses, label='epoch loss (blue)')\n",
    "\n",
    "# add axis legends\n",
    "ax.set_xlabel(\"[training epoch $e_i$]\", fontsize=10)\n",
    "ax.set_ylabel(\"[Classification Error $\\mathcal{L}^{NLL}$]\", fontsize=10)\n",
    "\n",
    "# set plot legend\n",
    "plt.legend(loc=\"upper right\", numpoints=1, fancybox=True)\n",
    "\n",
    "# add plot title\n",
    "plt.title('Training Epochs $e_i$ vs. Classification Error $L^{NLL}$', fontsize=10);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ok, fantastic. The training error is nicely going down. We could train the network a couple more epochs until the error converges. But let's stay with the 20 training epochs for now and continue with evaluating our trained model."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Evaluation of the Trained Neural Network Model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before evaluating our model let's load the best performing model. Remember, that we stored a snapshot of the model after each training epoch to our local model directory. We will now load the last snapshot saved."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# restore pre-trained model snapshot\n",
    "best_model_name = \"mnist_model_epoch_19.pth\"\n",
    "\n",
    "# init pre-trained model class\n",
    "best_model = MNISTNet()\n",
    "\n",
    "# load pre-trained models\n",
    "best_model.load_state_dict(torch.load(os.path.join(\"models\", best_model_name)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's inspect if the model was loaded successfully: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# set model in evaluation mode\n",
    "best_model.eval()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To evaluate our trained model, we need to feed the MNIST images reserved for evaluation (the images that we didn't use as part of the training process) through the model. Therefore, let's again define a corresponding PyTorch data loader that feeds the image tensors to our neural network: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mnist_eval_dataloader = torch.utils.data.DataLoader(mnist_eval_data, batch_size=10000, shuffle=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will now evaluate the trained model using the same mini-batch approach as we did throughout the network training and derive the mean negative log-likelihood loss of the mini-batches:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# init collection of mini-batch losses\n",
    "eval_mini_batch_losses = []\n",
    "\n",
    "# iterate over all-mini batches\n",
    "for i, (images, labels) in enumerate(mnist_eval_dataloader):\n",
    "\n",
    "    # run forward pass through the network\n",
    "    output = best_model(images)\n",
    "\n",
    "    # determine classification loss\n",
    "    loss = nll_loss(output, labels)\n",
    "\n",
    "    # collect mini-batch reconstruction loss\n",
    "    eval_mini_batch_losses.append(loss.data.item())\n",
    "\n",
    "# determine mean min-batch loss of epoch\n",
    "eval_loss = np.mean(eval_mini_batch_losses)\n",
    "\n",
    "# print epoch loss\n",
    "now = datetime.utcnow().strftime(\"%Y%m%d-%H:%M:%S\")\n",
    "print('[LOG {}] eval-loss: {}'.format(str(now), str(eval_loss)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ok, great. The evaluation loss looks in-line with our training loss. Let's now inspect a few sample predictions to get an impression of the model quality. Therefore, we will again pick a random image of our evaluation dataset and retrieve its PyTorch tensor as well as the corresponding label:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# set (random) image id\n",
    "image_id = 2000\n",
    "\n",
    "# retrieve image exhibiting the image id\n",
    "mnist_eval_image, mnist_eval_label = mnist_eval_data[image_id]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's now inspect the true class of the image we selected:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mnist_eval_label"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ok, the randomly selected image should contain a one (1). Let's inspect the image accordingly:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define tensor to image transformation\n",
    "trans = torchvision.transforms.ToPILImage()\n",
    "\n",
    "# set image plot title \n",
    "plt.title('Example: {}, Label: {}'.format(str(image_id), str(mnist_eval_label)))\n",
    "\n",
    "# plot mnist handwritten digit sample\n",
    "plt.imshow(trans(mnist_eval_image), cmap='gray')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ok, let's compare the true label with the prediction of our model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "best_model(Variable(mnist_eval_image))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can even determine the likelihood of the most probable class:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "torch.argmax(best_model(Variable(mnist_eval_image)), dim=1).item()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's now obtain the predictions for all the handwritten digit images of the evaluation data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "predictions = torch.argmax(best_model(Variable(mnist_eval_data.data.float())), dim=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Furthermore, let's obtain the overall classifcation accuracy:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "metrics.accuracy_score(mnist_eval_data.targets, predictions.detach().numpy())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's also inspect the confusion matrix to determine major sources of misclassification"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# determine classification matrix of the predicted and target classes\n",
    "mat = confusion_matrix(mnist_eval_data.targets, predictions.detach().numpy())\n",
    "\n",
    "# plot corresponding confusion matrix\n",
    "sns.heatmap(mat.T, square=True, annot=True, fmt='d', cbar=False, cmap='YlOrRd_r', xticklabels=range(0,10), yticklabels=range(0,10))\n",
    "plt.title('MNIST classification matrix')\n",
    "plt.xlabel('[true label]')\n",
    "plt.ylabel('[predicted label]');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ok, we can easily see that our current model confuses the digits 3 and 5 as well the digits 9 and 4 quite often. This is not surprising since those digits exhibit a high similarity."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercises:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We recommend you try the following exercises as part of the lab:\n",
    "\n",
    "**1. Train the network a couple more epochs and evaluate its prediction accuracy.**\n",
    "\n",
    "> Increase the number of training epochs up to 50 epochs and re-run the network training. Load and evaluate the model exhibiting the lowest training loss. What kind of behaviour in terms of prediction accuracy can be observed with increasing the training epochs?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ***************************************************\n",
    "# INSERT YOUR CODE HERE\n",
    "# ***************************************************"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**2. Evaluation of \"shallow\" vs. \"deep\" neural network architectures.**\n",
    "\n",
    "> In addition to the architecture of the lab notebook, evaluate further (more shallow as well as more deep) neural network architectures by (1) either removing or adding layers to the network and/or (2) increasing/decreasing the number of neurons per layer. Train a model (using the architectures you selected) for at least 50 training epochs. Analyze the prediction performance of the trained models in terms of training time and prediction accuracy. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ***************************************************\n",
    "# INSERT YOUR CODE HERE\n",
    "# ***************************************************"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Lab Summary:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this lab, a step by step introduction into the **design, implementation, training and evaluation** of neural networks to classify images of handwritten digits is presented. The code and exercises presented in this lab may serves as a starting point for developing more complex, more deep and tailored **neural networks**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You may want to execute the content of your lab outside of the Jupyter notebook environment, e.g. on a compute node or a server. The cell below converts the lab notebook into a standalone and executable python script. Pls. note that to convert the notebook, you need to install Python's **nbconvert** library and its extensions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# installing the nbconvert library\n",
    "!pip3 install nbconvert\n",
    "!pip3 install jupyter_contrib_nbextensions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's now convert the Jupyter notebook into a plain Python script:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!jupyter nbconvert --to script aiml_lab_08.ipynb"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "165px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
